Research on Robust Audio-Visual Speech Recognition Algorithms

Abstract

Automatic speech recognition (ASR) that relies on audio input suffers significant degradation in noisy conditions and is particularly vulnerable to interference. However, video recordings of speech capture both visual and audio signals, which makes them a potent source of information for training models. Audiovisual speech recognition (AVSR) systems make ASR more robust by using the lip movements associated with sound production in addition to the auditory input. There are many audiovisual models for speech transcription, but most of them have been tested in a single experimental setting with a limited dataset, whereas a good model should be applicable to any scenario. Our main contributions are: (i) reproducing three of the best-performing models in the current AVSR research area on the well-known LRS2 (Lip Reading Sentences 2) and LRS3 (Lip Reading Sentences 3) databases, and comparing and analyzing their performance under various noise conditions; (ii) drawing on our experience, analyzing the problems currently encountered in the domain, which can be summarized as the feature-extraction problem and the domain-generalization problem; (iii) showing that, according to the results, MoCo (momentum contrast) + word2vec (word vector) performs best on the LRS datasets whether or not noise is present. This combination also produced the best results in the recognition experiments. This work lays the foundation for further improving AVSR performance.
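The abstract does not say how the noise conditions were produced. One common way to build such test sets, assumed here purely for illustration, is to mix a noise recording into clean speech at a fixed signal-to-noise ratio (SNR). The following plain-Python sketch shows that step. `mix_at_snr`, `measured_snr_db`, and the tone and white-noise stand-ins are hypothetical and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    SNR_dB = 10 * log10(P_speech / P_noise_scaled), so the noise is multiplied by
    sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    The noise is repeated or truncated to match the length of the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat/truncate the noise so it covers every speech sample.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture, treating (mixture - speech) as the noise."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins: one second of a 440 Hz tone at 16 kHz, plus white noise.
    sr = 16000
    speech = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr // 2)]
    for snr in (10, 5, 0, -5):
        mixture = mix_at_snr(speech, noise, snr)
        print(f"target {snr:>3} dB -> measured {measured_snr_db(speech, mixture):.2f} dB")
```

Sweeping the SNR downward, as in the loop above, gives a graded set of noise conditions. The same clean utterance can then be scored at each level. In a real pipeline the stand-ins would be replaced by dataset audio and recorded noise such as babble.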

Similar resources

Continuous Audio-visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal model...

Audio - Visual Speech Recognition

We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, for ASR to approach human levels of performance and for speech to become a truly pervasive user interface, we need novel, nontraditional approaches that have the potential of yielding...

Noise-based audio-visual fusion for robust speech recognition

A major goal of current speech recognition research is to improve the robustness of recognition systems used in noisy environments. Recent strides in computing technology have allowed consideration of systems that use visual information to augment the decision capability of the recognizer, allowing superior performance in these difficult environments. A crucial area of research in audiovisual s...

A robust audio-visual speech recognition using audio-visual voice activity detection

This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments, using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD in noisy conditions, which detects presence ...

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

In this paper, we investigate audio-visual interaction in sparse representation to obtain robust features for audio-visual speech recognition. Firstly, we introduce our system which uses sparse representation method for noise robust audio-visual speech recognition. Then, we introduce the dictionary matrix used in this paper, and consider the construction of audio-visual dictionary. Finally, we ...

Journal

Journal title: Mathematics

Year: 2023

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math11071733